Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sixth reconciliation PR from production/RRFS.v1 #896

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

grantfirl
Copy link
Collaborator

Description

This PR contains changes from 2 PRs that went into production/RRFS.v1:

Provides the changes needed to take advantage of FMS parallel IO changes. Changes courtesy of Dan Kokron of GDIT. Will be paired with changes in atmos_cubed_sphere

This PR adds a new diagnostic, instantaneous downward shortwave flux at the surface assuming clear-sky conditions.

Issue(s) addressed

None

Testing

Tested on Hera using full rt.conf. See logs in UWM PR: =

Dependencies

ufs-community/ccpp-physics#235
NOAA-GFDL/GFDL_atmos_cubed_sphere#365

Requirements before merging

  • All new code in this PR is tested by at least one unit test
  • All new code in this PR includes Doxygen documentation
  • All new code in this PR does not add new compilation warnings (check CI output)

@grantfirl
Copy link
Collaborator Author

grantfirl commented Jan 3, 2025

@MatthewPyle-NOAA @dkokron When running UFS regression tests with this code, the control_restart_p8_intel test is failing with a segmentation fault. If you have access to Hera, you can see the logs here: /scratch1/BMC/gmtb/Grant.Firl/stmp2/Grant.Firl/FV3_RT/rt_1304460/control_restart_p8_intel

The err log shows the first non-libarary error as:
0x0000000002232a28 fv_io_mod_mp_fv_io_read_restart_() /scratch1/BMC/gmtb/Grant.Firl/ufs-weather-model-grantfirl/FV3/atmos_cubed_sphere/tools/fv_io.F90:495

This seems to be related to the changes that were made in this PR. Could you please help to debug this?

@dkokron
Copy link
Contributor

dkokron commented Jan 3, 2025 via email

@grantfirl
Copy link
Collaborator Author

I don't have access to Hera. Does this failure happen on Acorn too?

On Fri, Jan 3, 2025 at 9:46 AM Grant Firl @.> wrote: @MatthewPyle-NOAA https://github.com/MatthewPyle-NOAA @dkokron https://github.com/dkokron When running UFS regression tests with this code, the control_restart_p8_intel test is failing with a segmentation fault. If you have access to Hera, you can see the logs here: /scratch1/BMC/gmtb/Grant.Firl/stmp2/Grant.Firl/FV3_RT/rt_1304460/control_restart_p8_intel The err log shows the first non-libarary error as: 0x0000000002232a28 fv_io_mod_mp_fv_io_read_restart_() /scratch1/BMC/gmtb/Grant.Firl/ufs-weather-model-grantfirl/FV3/atmos_cubed_sphere/tools/fv_io.F90:495 This seems to be related to the changes that were made in this PR. Could you please help to debug this? — Reply to this email directly, view it on GitHub <#896 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACODV2DOTNIIHTKZXNEHNNL2I2WGXAVCNFSM6AAAAABTCZ24DSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNRZGQ2DENRTGI . You are receiving this because you were mentioned.Message ID: @.>

So far, it's been noted on Hera and Hercules. I'm guessing it would also show up on Acorn, but I don't have access.

@dkokron
Copy link
Contributor

dkokron commented Jan 3, 2025

@BrianCurtis-NOAA normally runs the UFS regression suite on acorn. Is he scheduled to do the same for this release?

@grantfirl
Copy link
Collaborator Author

@BrianCurtis-NOAA normally runs the UFS regression suite on acorn. Is he scheduled to do the same for this release?

This PR isn't necessarily associated with a release. It's just bringing changes from the RRFSv1 release branch into the main develop branch. Perhaps we can ask @BrianCurtis-NOAA to run the failing test only on Acorn for you to see if it fails there and you can have access to the run directory with logs?

@dkokron
Copy link
Contributor

dkokron commented Jan 3, 2025

I've run the UFS regression suite before. I was just wondering if Brian had done this already on acorn. Will the following clone get me to a state where I can reproduce the failure?

git clone --recursive [email protected]:ufs-community/ufs-weather-model.git CloneUFSForRegressionTesting

@grantfirl
Copy link
Collaborator Author

I've run the UFS regression suite before. I was just wondering if Brian had done this already on acorn. Will the following clone get me to a state where I can reproduce the failure?

git clone --recursive [email protected]:ufs-community/ufs-weather-model.git CloneUFSForRegressionTesting

I don't know if they had gotten to Acorn yet or not, but they stopped testing once this issue was noticed.

I'd use:
git clone --branch rrfsv1-to-ufs/dev6 --recursive https://github.com/grantfirl/ufs-weather-model.git
since this is the source branch for the PR. I'm sure that there are other ways.

@dkokron
Copy link
Contributor

dkokron commented Jan 3, 2025

Runs fine on acorn.
acorn:/lfs/h1/hpc/support/daniel.kokron/FV3_RT/rt_1846251/control_restart_p8_intel

Please try disabling the ENABLE_PARALLELRESTART feature in your Hera build.

line 39 of FV3/atmos_cubed_sphere/CMakeLists.txt
from
option(ENABLE_PARALLELRESTART "Enable collective parallel reads -DENABLE_PARALLELRESTART" ON)
to
option(ENABLE_PARALLELRESTART "Enable collective parallel reads -DENABLE_PARALLELRESTART" OFF)

@grantfirl
Copy link
Collaborator Author

Runs fine on acorn. acorn:/lfs/h1/hpc/support/daniel.kokron/FV3_RT/rt_1846251/control_restart_p8_intel

Please try disabling the ENABLE_PARALLELRESTART feature in your Hera build.

line 39 of FV3/atmos_cubed_sphere/CMakeLists.txt from option(ENABLE_PARALLELRESTART "Enable collective parallel reads -DENABLE_PARALLELRESTART" ON) to option(ENABLE_PARALLELRESTART "Enable collective parallel reads -DENABLE_PARALLELRESTART" OFF)

Interesting, thanks. I'll try your suggestion on Hera and report back on this PR.

@grantfirl
Copy link
Collaborator Author

@dkokron The offending test on Hera passes with -DENABLE_PARALLELRESTART" OFF.

@dkokron
Copy link
Contributor

dkokron commented Jan 3, 2025

@bensonr
Our agreement with the FMS team was that they would take ownership after they accepted the contributions that enable this feature. How do you want to proceed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants